A Metric Index for Approximate Text Management

Authors

  • Vlastislav Dohnal
  • Claudio Gennaro
  • Pavel Zezula
Abstract

Text collections need search support not only for identical objects; approximate matching is even more important. A suitable metric for this task is the edit distance. However, the quadratic computational complexity of the edit distance rules out naive storage organizations, such as sequential search, so more sophisticated search structures must be applied. We have investigated the properties of the D-index for approximate searching and matching in text databases. The experiments confirm very good performance for retrieving close objects and sub-linear scalability when processing large files. Even similarity joins can be performed efficiently.
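To make the cost argument concrete, here is a minimal sketch of the edit distance (in its common Levenshtein form) and of the naive sequential search that the abstract rules out. The function names `edit_distance` and `sequential_range_search` are illustrative assumptions and do not come from the paper; the D-index itself is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming: O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def sequential_range_search(collection, query, radius):
    """Naive baseline (hypothetical helper): one distance computation per stored string."""
    return [s for s in collection if edit_distance(s, query) <= radius]
```

Each query against a collection of n strings costs n quadratic-time distance computations in this baseline, which is the overhead a metric index aims to reduce by pruning candidates before computing distances.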

Similar resources

Large-scale similarity data management with distributed Metric Index

Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management is still an open research challenge. In this work, we try to make an important step towards such a management system that would be able to scale to data collections of billions of ob...

A Metric Index for Approximate String Matching

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the occ occurrenc...

Approximate String Matching ? Edgar

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This permits us finding the R occurrences of ...

Approximate fixed point theorems for Geraghty-contractions

The purpose of this paper is to obtain necessary and sufficient conditions for the existence of approximate fixed points of Geraghty-contractions. In this paper, definitions of approximate -pair fixed point for two maps Tα, Sα and their diameters are given in a metric space.

Some approximate fixed point results for proximinal valued $\beta$-contractive multifunctions

In this paper, we prove some approximate fixed point results for proximinal valued $\beta$-contractive multifunctions on metric spaces. We show that our results generalize some old fixed point results in the literature.

Publication date: 2002